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Abstract. Protein function frequently involves conformational changes with large 
amplitude on time scales which are difficult and computationally expensive to access 
using molecular dynamics. In this paper we report on the combination of three 
computationally inexpensive simulation methods — normal mode analysis using the 
elastic network model, rigidity analysis using the pebble game algorithm, and geometric 
simulation of protein motion — to explore conformational change along normal 
mode eigenvectors. Using a combination of ElNemo and First/Froda software, 
large-amplitude motions in proteins with hundreds or thousands of residues can be 
rapidly explored within minutes using desktop computing resources. We apply the 
method to a representative set of 6 proteins covering a range of sizes and structural 
characteristics and show that the method identifies specific types of motion in each 
case and determines their amplitude limits. 
Revision : 1.77, compiled 27 January 2012 
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1. Introduction 

Protein conformational changes and dynamic behaviour are fundamental for processes 
such as catalysis, regulation, and substrate recognition. The time scales of motions 
involved in enzyme function span multiple orders of magnitude, from the picosecond 
timescale of local side groups rotations to the milli-/microsecond timescale of the motion 
of entire domains fil. Empirical-potential molecular dynamics (MD) has proved to be a 
valuable tool for investigating molecular motions, but specialized expertise, large-scale 
computing resources and weeks or months of compute time are required to explore 
protein motion on simulation time scales greater than tens of nanoseconds p] . There is 
clearly a need for methods that that permit exploration of possible large conformational 
motions of proteins in a rational, albeit somewhat simplified, fashion with minimal 
computational resources. Such explorations allow for the generation of hypotheses about 
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large conformational motion and protein function that can then be investigated with 
detailed MD simulations and by experimental methods such as FRET (3j. 

The computational cost of simulations can be reduced by coarse-graining (CG) 
— averaging out atomic degrees of freedom so as to represent groups of atoms by a 
single site j4|[5) — and/or by simplifying the intersite interactions. Different levels 
of simplification can then be combined in multi-scale methods (6|[T| . Here, we shall 
consider three such methods in particular. Pebble-game rigidity analysis, implemented 
in First (8j, provides valuable information on the distribution of rigid and flexible 



regions in a structure 9 . Geometric simulation in the Froda algorithm 10 uses rigidity 



information and explores flexible motion [TT, 12 . Normal mode analysis of a coarse- 



grained elastic network model (ENM), implemented in ElNemo 13, 14], generates 



eigenvectors for low-frequency motion which are potential sources of functional motion 



and conformational change 15 -19]. 

Our approach in this study is to bias the generation of new conformations in 
First/Froda along an eigenvector predicted by ElNemo as a low- frequency mode. 
The bias directs the motion of the atoms in the direction of the eigenvector while 
the geometric constraint system maintains rational bonding and sterics and prevents 
the build-up of distortions that occur in a linear projection. The method is outlined 
schematically in Figure [T] We apply our method to a set of six proteins of various 
sizes from 58 to 1605 residues. We find that flexible motion in an all-atom model can 
be explored to large amplitudes in a few CPU-minutes, until further motion is limited 
by bonding or steric constraints, representing the calculated limit of motion along that 
vector. 

2. Methods 

2.1. Protein selection 

We deliberately selected six proteins for analysis that are diverse in function, structural 
characteristics and size, ranging from 58 to 1605 residues. For each protein, we selected 
a representative high-resolution structure from the Protein Data Bank (PDB) (20j. The 
proteins and their PDB codes are listed in Table [T] and their structures are shown in 
Figure [2] with colour coding according to the results of rigidity analysis. 

Bovine pancreatic trypsin inhibitor (BPTI) is a small well-studied protease inhibitor 
of 58 amino acids, comprising mainly random-coil structure plus two antiparallel /?- 
strands and two short a-helices; the protein has only a small hydrophobic core, 



but is additionally stabilized by 3 intra-chain disulphide bonds 21,22 . Mammalian 
mitochondrial cytochrome-c is a classic electron-transfer protein containing a redox- 
active haem group bound within a primarily a-helical protein fold. These two were 
selected as contrasting small proteins. 

As medium size proteins we selected ad-antitrypsin and the core catalytic domain of 
the motor protein kinesin |23| . The former is a protease inhibitor of the serpin family [24] 
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which operates via a 'bait' mechanism comparable to that of a mouse-trap, involving a 
very significant conformational change, whereas the latter is a mechanochemical device 
that transduces the chemical energy of ATP hydrolysis into mechanical work, specifically 
the depolymerisation of microtubules in the case of this kinl kinesin. Both these proteins 
comprise an extensive /5-sheet core flanked by several a-helices. 

Protein disulphide-isomerase (PDI) is a large protein (with more than 500 residues) 
comprising 4 distinct domains each with a thioredoxin-like fold, connected by two short 
and one longer linker 125) ; the protein has both redox and molecular chaperone activity 
and intramolecular flexibility is essential for its action in facilitating oxidative folding 



of secretory proteins [26, 27 . The largest protein selected is an integral membrane 
protein (a bacterial protein of 1605 residues) that operates as a pentameric ligand-gated 
ion channel (pLGIC); it comprises an extracellular — mainly /3-sheet — domain and 
a membrane-embedded domain, mainly comprising a-helices which form the lining of 
the ion-channel; the mechanisms of ion permeation and channel gating are not yet 
completely understood but it is clear that a conformational change is required for 
function [28] . 

2.2. Rigidity analysis and energy cutoff selection 

We add the hydrogen atoms absent from the PDB X-ray crystal structures using the 
software Reduce |29| and remove alternate conformations and renumber the hydrogen 
atoms in Pymol |30|. This produces usable files for First rigidity analysis. For 
each protein we produce a "rigidity dilution" or rigid cluster decomposition (RCD) (8) 



plot (displayed in the supplementary Figure S2). The plots show the dependence of 
the protein rigidity on an energy cutoff parameter, £7 cu t, which determines the set of 
hydrogen bonds to be included in the rigidity analysis. The tertiary structures with the 



residues coloured by the rigid clusters they belong to are shown in Figures [S3HS8] for 
each of the selected energy cutoffs. 

Previous studies [8] suggested that E cut should be at least —0.1 kcal/mol in order 
to eliminate a large number of very weak hydrogen bonds, and that a natural choice is 
near the 'room temperature' energy of —0.6 kcal/mol. We have recently discussed the 
criteria for a robust selection of E cut M . For each protein we have selected several energy 
cutoffs at which to explore flexible motion, as listed in Table [T] A higher cutoff energy 
increases the number of constraints included in the simulation, and this is expected to 
restrict protein motion. We have used in each case at least one cutoff at which the 
protein is largely rigid (in the range —0.1 kcal/mol to —0.7 kcal/mol) and at least one 
lower cutoff at which the protein is largely flexible (in the range —0.5 kcal/mol to —2.2 
kcal/mol). 

2.3. Normal modes of motion 

We obtain the normal modes of motion using the ENM [3TJ implemented in ElNemo 
software 13 ,14]. This generates, for each protein, a set of eigenvectors and associated 
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eigenvalues. Other implementations of elastic network models are also available, for 
example the AD-ENM of Zheng et al. [32]. 

The low-frequency modes are expected to have the largest amplitudes and thus be 
most significant for large conformational changes. However, the six lowest-frequency 
modes (modes 1 to 6) are trivial combinations of rigid-body translations and rotations 
of the entire protein. For illustration, here we consider the five lowest-frequency non- 
trivial modes, that is modes 7 to 11 for each protein. We will denote these modes as 
ra 7 , ra 8 , . . . , mn. 

The mode eigenvectors are predicted on the basis of a single protein conformation. 
The amplitude to which a mode can be projected may be limited by bonding and/or 
steric constraints that are not evident in the input structure or fully captured by the 
ENM. A linear projection of all the residues in the protein along a mode eigenvector 
introduces unphysical distortions of the interatomic bonding. Typically, to avoid this 
and project a mode to finite amplitude requires one or more cycles of a combined 
method; the mode is projected linearly until distortions become evident and the resulting 



structure is relaxed using constrained MD/molecular mechanics 33 -35 . We explore an 
alternative method for projection of modes to large amplitudes, using rigidity analysis 
and geometric simulation. 

2.4- Froda mobility simulation 



Geometric simulation, implemented in the Froda module within First [10JJ36J, explores 
the flexible motion available to a protein with a given pattern of rigidity and flexibility. 
New conformations are generated by applying a small random perturbation to all atomic 
positions; Froda then reapplies bonding and steric constraints to produce an acceptable 
new conformation. Motion can be biased by including a directed component to the 
perturbation. The capability to use a mode eigenvector as a bias was implemented in 
First/Froda by one of us (SAW) and has been briefly reported previously [37]. The 
combination of ElNemo and First/Froda, illustrated schematically in Figure [TJ is 
described in detail in supplementary material (section |Sl| ) . 

Since the displacement from one conformation to the next is small, we record 
only every 100th conformation and continue the run for typically several thousand 
conformations. The run is considered complete when no further projection along 
the mode eigenvector is possible (due to steric clashes or bonding constraints) which 
manifests itself in slow generation of new conformations and poor reproducibility in the 
results of independent runs. We have performed Froda mobility simulation for each 
protein at several selected values of E cut , see section |2.2[ 

During conformation generation we track the fitted RMSD between a carbons 
of the initial and current conformation. This measure is discussed in more detail in 



supplementary material (section 2.5). To project a mode to an amplitude of several A 
in fitted RMSD typically takes a few CPU-minutes on a single processor. We carry out 
five parallel simulations for each structure, mode and direction of motion and monitor 
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the evolution of fitted RMSD during each run, as illustrated in Figures |3}(6j 
2.5. Raw and fitted RMSD 

The RMSDs reported in Table [Si] are a carbon RMSDs from the input structure 



to a generated conformation, obtained after least-squares fitting using the PyMOL 
intra_f it command. These values differ somewhat from the raw RMSD values reported 
by Froda in its output files, which are calculated without any fitting being carried out. 
In particular, the fitted RMSD saturates once further motion along the mode direction 
is no longer possible, due to steric clashes or limits imposed by covalent or noncovalent 
bonding constraints. The raw RMSD is greater than the fitted RMSD and tends not 
to saturate, but rather to continue to increase slowly, once the motion is effectively 
jammed. The reason for this different behaviour is a small difference in the statistical 
weighting given to each residue by ElNemo and by First/Froda. In the elastic 
network modelling, every residue is given equal statistical weight. The non-trivial mode 
eigenvectors thus generated have no significant component of rigid-body motion for the 
whole structure. In Froda, however, the bias is applied to an all-atom representation of 
the structure; and thus the bias applied to a residue with many atoms affects the whole- 
body motion of the structure more than the bias applied to a residue with few atoms. 
The motion in Froda therefore acquires a small component of rigid-body translation 
and rotation, which increases the raw RMSD. Least-squares fitting to the input structure 
removes the rigid-body components, so the fitted RMSD detects actual conformational 
change. 

The effects of fitting are illustrated in Figure [STfc for mode 7717 of structure 
1BPI. The raw RMSD values increase almost linearly during the generation of 10000 
conformers, whereas the fitted RMSD values saturate for conformers from « 5000 up to 
10000. Conformers 5000 and 10000 differ by « 3A in raw RMSD but by only « 0.8A 
in fitted RMSD. Superpositions of conformers 0, 5000 and 10000 with and without 
fitting, shown in Figures |ST)3,c, show that conformers 5000 and 10000 are indeed very 
similar to each other. Hence, fitting structures before calculating RMSD values allows 
us to identify real conformation change and remove the component of rigid-body motion 
introduced by Froda. 

3. Results 

3.1. Conformer generation with mode bias 

The output of the mobility simulations for BPTI (1BP1) is summarized in Figures [3^i-c. 
The evolution of RMSD for each of 777,7, • • • ,^11, during runs at two selected values of 
E'cutj is represented in Figures^ and[3]3 respectively. In all cases, we observe an initial 
phase in which the RMSD increases almost linearly, as the protein explores the mode 
direction without encountering significant steric or bonding constraints on the motion. 
During this phase, generation of new conformations in Froda is very rapid and the 
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RMSDs from different runs are very similar to each other. The RMSD then displays 
an inflection, ceasing to rise linearly, and approaching an asymptote; this indicates that 
steric clashes and bonding constraints (such as hydrophobic tethers) are preventing 
further exploration along the mode direction. The asymptote is thus an amplitude limit 
on the mode. In this phase the generation of new conformations in Froda becomes 
slower as the fitting algorithm has increasing difficulty finding a valid conformation, 
and the RMSDs achieved by different runs differ. In the regime of slow conformation 
generation, the mode bias is forcing the structure into a regime of steric clashes and/or 
of bonding constraint limits, for example when residues connected by a hydrophobic 
tether are being pushed apart. This regime cannot correspond to the low-frequency 
flexible motion which we wish to explore. Our presentation of RMSD data for larger 
proteins is therefore truncated once this "jamming" starts. For most of our proteins, 
2500 conformations is sufficient to cover the regime of rapid conformation generation. 
For our largest protein, pLGIC (2VL0), we find that 1000 conformations was sufficient. 

3.2. Small loop motion 

In Figure |3^l we show an ensemble of structures for BPTI generated by exploring the 
lowest-frequency nontrivial mode, 777,7, an d in Figures [3)3, c we show the RMSDs achieved 
for the five lowest-frequency nontrivial modes. The amplitudes of flexible motion for 
BPTI at the cutoffs —0.2 kcal/mol and —2.2 kcal/mol reach an asymptote at RMSD 
values around 2 to 4A. Considerable asymmetry, a factor of 2 difference in the achievable 
RMSD, is observable in some modes between the two possible directions of motion. Let 
us emphasize here that in general the amplitudes of flexible motion are not available 
from either the elastic network model or the rigidity analysis without simulation of 
flexible motion. 

BPTI (1BPI) is a small protein with some relatively flexible loop regions. For 
contrast, we now examine a compact globular protein, cytochrome-c (1HRC). RMSD 
results are shown in Figure [3^,f truncated once jamming had begun. Although this 
protein is twice as large as BPTI in terms of residues, its capacity for flexible motion is 
visibly more limited, with amplitudes below 2 A in all modes. This result is important in 
validating our method of projecting modes to large amplitude; if geometric simulation 
were capable of reaching unphysically large amplitudes, the method would lose its value. 

3.3. Large loop motion 

RMSDs for low- frequency modes of internal kinesin motor domain protein (1RY6) are 
shown in Figure [p), c for two different energy cutoffs. The amplitudes achievable for 
these modes differ little between the different energy cutoffs; motion occurs principally 
in a flexible loop region around residues 37-46. An exploration of 777,7 is presented in 
Figure [4^l for an energy cutoff of —1.1 kcal/mol, clearly showing the loop motion. We 
find that the combination of the mode bias and the bonding constraints naturally causes 
the large flexible loop to follow a curved trajectory. 
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As shown in Figures [i^, f, al-antitrypsin (1QLP) displays several low-frequency 
modes which easily explore amplitudes of up to 2-2. 5 A RMSD depending on the rigidity 
cutoff. The motion shown in Figure [4]l again involves the easy motion of large flexible 
loops with respect to the relatively rigid /3-sheet core of the protein. 

3.4- Domain motion 

Protein disulphide isomerase (2B5E) is an interesting case for protein mobility. The 
protein consists of four domains (a-b-b'-a') connected by flexible linkers. Biological 
evidence indicates that conformational flexibility is vital to the function of the enzyme 
[25] . Rigidity analysis immediately brings out the flexibility of the molecule (see Figure 
|S2^). Even at very high E cut values (E cut = —0.015 kcal/mol), the rigidity analysis 
reveals the domain organisation of the protein with each domain corresponding to 
a distinct rigid cluster flanked by flexible linkers. The RMSD achievable by low- 
frequency flexible motion therefore does not depend significantly on the energy cutoff; 
motion is slightly limited at the weakest cutoff (Figure (5^), but at other cutoffs the 
achievable amplitudes are essentially the same (Figure^ and d). Close examination 
of the conformation generation in First/Froda indicates that the amplitudes are 
limited eventually as further motion along the mode would over-extend covalent and 
hydrophobic-t ether constraints. 

The inter-domain nature of the flexible motion is detailed in Figure^ which shows 
structures at the amplitude limits of 7717 in the positive and negative directions. The 
structures are aligned on the b~b' domains, bringing out the motion of the a domain and 
particularly of the a' domain. The CPU time required to project this protein with more 
than 500 residues along 7717 to an amplitude of more than 20 A is less than 15 minutes. 

The largest protein we have investigated is pLGIC (2VL0). Its pentameric structure 
includes a transmembrane domain composed of a-helices and an extracellular domain 
consisting largely of /3-sheets. The major rigidity transition in the protein identified by 
First occurs between a cutoff of —0.4 kcal/mol, when almost the entire structure forms 
a single rigid cluster, and —0.5 kcal/mol, when the five backbone sections linking the 
two major domains have become flexible and the many a-helices in the transmembrane 
are mutually flexible. At the lower energy cutoff it is possible for the two domains to 
move relative to each other, and the transmembrane helices are also capable of relative 
motion. The increased RMSD possible at the lower E cut is visible in Figure [HJ 

The flexible motion at the higher cutoff, with the protein largely rigid, involves only 
the motion of a few flexible loops. The motion at the lower cutoff is far more biologically 
interesting. The lowest-frequency non-trivial mode, 7717, involves a counter-rotation of 
the transmembrane and extracellular domains, including a change in the relative tilt of 
transmembrane helices lining the ion channel. This flexible motion is shown in Figure 
[6^1, b. The CPU time to project this large protein with more than 1600 residues along 
my to its amplitude limit is less than twenty minutes. 
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4. Discussion 

4.1. Extensive RMSD as a characterisation of total flexible motion 

The evolution of RMSD values is displayed in Figures [3H6] and the maximum values 



achieved by these motions, which range from 1.5 A to 10 A, are shown in Table SI 
However, the character of the flexible motion does not seem well reflected by the RMSD 
values. For example, a small protein of 58 residues without a large conformational 
change such as BPTI shows RMSD values of up to 3.5A in its small loop motion (Figure 
[3]); whereas the channel protein pLGIC, a thirty times larger protein by residue count 
(1605 residues), shows a substantial domain motion, in which a large proportion of 
the atoms undergo relative motion as the transmembrane protein and extra cellular 
domains counter-rotate (see Figure [6^1, b). Yet the channel protein pLGIC shows 
maximum RMSD values of around 2. 5 A only. So, although RMSD is a good measure 
for comparing two similar structures, it does not necessarily capture the scale of motion 
in different structures. 

We introduce an extensive RMSD measure by multiplying the RMSD (which 
describes the average displacement of atoms) by the number of residues in the protein. 
Figure [7] shows these xRMSD values for all the selected proteins moving along The 
three categories of motion which we have discussed — small loop motion, large loop 
motion and domain motion — become clearly visible in xRMSD. The xRMSD results 
for BPTI and cytochrome-c closely resemble each other even though cytochrome-c is 
almost double the size of BPTI. Similarly, the kinesin protein and the ad-antitrypsin 
display similar xRMSD behaviour to each other in their large loop motion. PDI and the 
pLGIC likewise have similar xRMSD behaviour reflecting their domain motion. Thus 
the character and extent of the flexible motions in proteins of various sizes, as shown in 
Figures [3^i,d, [4^i,d, [5^i and [6^1, b, is better reflected by the xRMSD than by the RMSD 
alone. 

4-2. Monitoring the evolution of normal modes 

It is implicit in normal-mode analysis of protein conformational change that a mode 
eigenvector should be a valid direction for motion over some non-zero amplitude. For 



example, Krebs et al. 38 have surveyed a large number of known conformational 
changes, using paired crystal structures, comparing the vector describing the observed 
conformational change to the low-frequency elastic network mode eigenvectors using a 
dot product. In many cases the observed change had a large dot product (> 0.5) with 
only one or two normal modes. 

X For the proteins with domain motion, we have chosen values of E cut which correspond to lower 
flexibility. The observed large variation in xRMSD is therefore taking place despite a restrictive bond 
network. On the other hand, for the proteins with loop motion we selected energy cutoffs which allow 
more structural flexibility. In this situation, although larger regions of the protein could become mobile, 
we still only observe localized loop motion. 
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In each of our simulations we use an initial normal mode, rrij , as a bias throughout 
the simulation. We calculate a new set of current normal modes, mf \ for each newly 
generated conformation. We compute the dot product of the bias vector, mf\ with 
mf \ that is, the current normal mode with the same mode number j, as in mf • mf . 
Graphs of these dot products are shown for all protein structures investigated here in 
the supplementary Figures |S9f|S14 



We find that three main classes of dot product behaviour emerge, shown 
schematically in Figure [8^. In motif 1, the dot product remains close to 1 throughout 
the simulation, indicating the initial and current modes remain very similar. In this case 
the motion of the protein is not introducing significant changes to the elastic network 
model and the mode eigenvector remains almost unchanged. In motif 2, there is a 
gradual decline in the dot product, of quadratic or cosine character; this suggests a 
gradual rotation of mf relative to mf. Perhaps the most interesting case is motif 
3, in which the dot product mf • mf collapses rapidly; this can occur at any point 
in the simulation, even if the RMSD between the initial and current conformations is 
small. Examples of these motifs can be observed for all our proteins. We find, e.g. a 



motif 1 behavior in mode 7717 for PDI (cp. Figure S13 3, c, d), a motif 2 behaviour in 



7717 for kinesin (cp. Figure S12| ) and motif 3 character in m^ for antitrypsin (cp. Figure 



Sll) as well as in BPTI (cp. Figure S9). Similar agreement with all motifs can be 
found for higher modes, although, as a general tendency, the smooth motifs 1 and 2 
become gradually less visible and the more rapid changes exemplified by motif 3 more 
pronounced. 

These sudden collapses do not indicate that the initial normal mode eigenvector 
has ceased to be a valid direction along which flexible motion is possible. Rather, 
the eigenvector has ceased to represent a single pure mode; mf now has significant 
overlap with multiple other modes mf . This mode mixing is illustrated, e.g., in Figure 
8^ for projection of cytochrome-c (1HRC) along mf at a cutoff energy E cut = —1.2 
kcal/mol. After projection in the positive direction over about 0.25A RMSD, mf has 
a large overlap with mf} and also with mff After a similar projection in the negative 
direction, mf has only a small overlap with mff and instead has some overlap with 
many low-frequency modes mf such as k = 8, 9, 10, 12, 13, 14. Thus, a pure mode 
calculated on one structure may be a mixture of multiple modes when calculated on a 
very similar conformation. 

These results clarify that while the dot products provide useful additional 
information about the stability of the initial modes during the simulated motion, they are 
not simply correlated with loop or domain motion in contradistinction to the xRMSD. 



4-3. Significance of rigidity- analysis energy cutoff 

In the case of small loop motion, it is clear that lowering the rigidity-analysis energy 
cutoff — thus making the structure more flexible — increases the amplitude of flexible 
motion, as one might expect. The simulation of flexible motion can thus add value 
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to rigidity analysis of protein structures by identifying the constraints that must be 
eliminated in order for two residues to become independently mobile. In the case of 
large domain motion, however, the most important criterion appears to be whether the 
domains are mutually rigid or not. 

We can see in the case of PDI (Figure [5]) that the amplitude of flexible motion for 
the lowest-frequency modes is almost unaffected by the choice of the energy cutoff (E cut ) 
provided it is set at a reasonable value of energy cutoff which represents each domain 
as a number of separate small rigid clusters. This conclusion can also be drawn in the 
case of the ligand-gated ion channel protein. 

5. Conclusions 

We have reported a hybrid method to explore protein motion by integrating both rigidity 
constraints from First and directional information, in the form of low-frequency elastic 
network mode eigenvectors obtained using Elnemo, into the geometric simulation 
method Froda. The exploration brings out features of the motion that could not 
be inferred using First or ElNemo alone. In order to illustrate the method, we have 
applied it here to a diverse selection of proteins whose flexible motion ranges from small 
loop motion (BPTI, cytochrome-c) and large loop motion (a kinesin and an antitrypsin) 
to large motions of entire domains (protein disulphide isomerase and a transmembrane 
pore protein) . Detailed studies of dynamics in relation to function of particular proteins 
are currently in progress (39|[40). The combined method can rapidly explore motion 
to large amplitudes in an all-atom model of the protein structure, maintaining steric 
exclusion and retaining the covalent and noncovalent bonding interactions present in the 
original structure. Significant amplitudes of motion are achieved with only CPU-minutes 
of computational effort even in a pentameric pore protein with more than 1600 residues. 
The amplitude of motion that can be achieved by flexible loops increases as the rigidity- 
analysis energy cutoff is lowered. For large-scale motion of domains, the most important 
criterion is that the energy cutoff should be low enough that different domains do not 
form a single rigid body. We note that RMSD, a measure of structural similarity, does 
not properly reflect the scale of flexible motion between different proteins; this is better 
captured by an extensive measure, xRMSD, which reflects both the size of the protein 
and the amplitude of its motion. Examination of the behaviour of the elastic network 
eigenvectors during the motion shows many examples of mode mixing, so that a given 
vector of motion can change from being a pure mode to a mixed one after quite small 
displacements, without losing its character as an "easy" direction for flexible motion. 

We believe that the ability to explore large amplitudes of flexible motion in an all- 
atom model with minimal computational resources will be of great use in biochemistry, 
structural biology, and biophysics, as a generator of new hypotheses and intuitions 
about protein structure and function, and as an ally to other simulation methods such 
as molecular dynamics, Monte Carlo folding simulations [4l] and ab-initio simulations. 
Such a study is currently in preparation for PDI [39) . 
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Table 1. The proteins and specific structures selected for this study. For each 
protein, various E cut values (kcal/mol) were chosen on the basis of rigidity analysis 
(Figure S2); the Table presents those values used in the simulations of motion that 
are presented in the main text. Bold values for E cut have been used to compute the 
xRMSD. Visualization of structures for additional cutoffs (see Table S2) are shown in 



the supplementary material, see Figures S3 - S8 



Protein 


PDB 


Resolution 


Residues 


E cut (kcal/mol) 


BPTI 


1BPI 


1.1A 


58 


-0.2, - 


2.2 


Cytochrome-c 


1HRC 


1.9A 


105 


-0.7, - 


1.2 


Kinesin 


1RY6 


1.6A 


360 


-0.4, - 


1.1 


ttl-antitrypsin 


1QLP 


2.0A 


394 


-0.1, - 


1.1 


PDI 


2B5E 


2.4A 


504 


-0.015, 


-0.522, -1.412 


pLGIC 


2VL0 


3.3A 


1605 


-0.4, - 


0.5 
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Figure 1. Schematic of the method. The input (at left) is an all- atom protein 
structure. Normal mode analysis (above) models the protein with a one-site-per- 
residue coarse graining and a simple spring model to produce an eigenvector for low- 
frequency motion. Rigidity analysis (below) identifies noncovalent interactions in an 
all-atom model of the protein and divides the protein into rigid clusters and flexible 
linkers. Geometric simulation (right) integrates normal-mode and rigidity information 
to explore the flexible motion of the protein. 
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Figure 2. Tertiary structure of all six protein structures (a) BPTI (1BPI), (b) 
cytochrome-c (1HRC), (c) od-antitrypsin (1QLP), (d) kinesin (1RY6,) (e) yeast PDI 
(2B5E) and (f) pLGIC (2VL0). The structures are given in standard Pymol [30] 



format but broken into rigid clusters according to the rigidity analysis (cp. Fig. S2) 
at the specific values of E cut shown in Table [I] Each rigid cluster is represented in 
a different colour with the largest rigid cluster indicated in blue and flexible regions 
shown in black. 
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Figure 3. Superimposed structural variations and fitted RMSD for small loop motion 
as found in BPTI and cytochrome-c. Panels (a) and (d) indicate the range of projected 
tertiary structure for motion along mode mj at E cut = — 2.2 kcal/mol for BPTI and at 
^cut = —1-2 kcal/mol for cytochrome-c, respectively. Panels (b,c) and (e,f) show the 
fitted RMSD as a function of Froda conformations for BPTI (1BPI) and cytochrome- 
c (1HRC), respectively for the non-trivial modes 777,7, . . . , ran at two values of E cut 
as shown. Positive conformation values indicate motion along the direction of the 
corresponding ElNemo mode, whereas negative conformation values indicate motion 
in the opposite direction. Points and error bars indicate mean and standard deviation 
obtained from five runs of the conformation generation for each mode. 
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Figure 4. Superimposed structural variation and fitted RMSD for large loop motion 
as in (a) kinesin (1RY6) and (d) antitrypsin (IQLP) for E cut = — 1.1 kcal/mol. Panels 
(b,c) and (e,f) represent — as in Figure [3] — the fitted RMSD at two values of E cut 
for kinesin and antitrypsin, respectively. Points and error bars have been determined 
as in Figure [3] 
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Yeast PDI 




Figure 5. Superimposed structural variation of large domain motion and fitted 
RMSD for yeast PDI (2B5E). (a) We show the initial tertiary structure as opaque and 
the projected structures as partially transparent. All structures are aligned on the 
central two domains b-b' to highlight the motion of the a and a' domains. Motion 
represents large conformational change along 777,7. Panels (b), (c) and (d) show the 
fitted RMSDs relative to the initial conformation for three values of ^ C ut- Points and 
error bars as in Figure [3] 
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pLGIC (topview) pLGIC (sideview) 




Figure 6. Large scale twist motion in a ligand gated ion channel (2VLO). (a) View 
down the transmembrane channel in its initial state and projected along mj in two 
directions, (b) Side view showing tilting of the helices during the motion. In both 
images the structures have been aligned on the extracellular /3-sheet portion so as to 
highlight the relative motion of the domains, and residues from number 283 upwards 
in each chain are not shown to make the major helices visible, (c and d) Fitted RMSDs 
relative to initial conformation for low-frequency modes mj, . . . , ran and two cutoff 
energies. Points and error bars as in Figure [3j 
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Figure 7. Extensive RMSD as a function of FRODA conformations for all six proteins 
moving along mode nij. The maximum xRMSD values range from 150A for BPTI to 
5243A for PDI. There are three clear categories of protein motion: large conformational 
changes achieved by domain motion (PDI and pLGIC), large loop motions (antitrypsin 
and kinesin) and small loop motions (BPTI and cytochrome-c). Theselected E cut for 
each protein are E^ t pl = -2.2kcal/mol, E^ c = -1.2 kcal/mol, ££* Y6 = -1.1 
kcal/mol, £^ t LP = -1.1 kcal/mol, E^ t 5E = -0.522 kcal/mol and E^ t L0 = -0.5 
kcal/mol. The XRMSD values obtained for mg, . . . , mil for the selected proteins are 
consistent with mj xRMSD. 
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Figure 8. Dot product motifs and illustration of mode mixing, (a) Schematic 
representation of the typical behaviours of the dot product of an initial mode with 
a current mode during projection along the initial mode eigenvector. Motif 1: gradual, 
nearly quadratic reduction in the dot product due to a progressive rotation of the 
current mode compared to the initial one. Eventual constant behaviour indicates that 
the motions has reached its amplitude limit. Motif 2: more rapid roughly quadratic 
reduction. Motif 3: sudden collapse of the dot product and the initial mode no 
longer resembles the current mode with the same mode number, (b) Dot products 

(i) 

computed for initial normal mode m\{ of cytochrome-c and current normal modes 
from rnf* to . Columns in the positive overlap direction denote projection in the 
positive direction. Columns in the negative overlap direction indicate projection into 
the opposite direction. The vertical dashed line marks mode and the horizontal 
dotted line indicates a dot product value. 
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Supplementary information 

SI. Settings, input files and command line options 

All simulations were carried out using the ElNemo and First software. Froda is a 
routine available within First. The file formats and command line options to apply a 
mode eigenvector as a bias in First/Froda have not previously been published in the 
literature, and we therefore describe them here. 

The required input for all calculations is a .pdb format file containing an all- 
atom representation of the protein structure including hydrogen atoms, which we shall 
name protein. pdb. Our usual procedure is to obtain a file containing the heavy atom 
positions from the Protein Data Bank; to remove alternate side chain conformations and 
nonbonded heteroatoms including water molecules; to add hydrogens using the Reduce 
software, including flipping of side chains where necessary; and to renumber the atoms 
sequentially using PyMOL. 

Normal mode calculations were carried out using ElneMo using the default setting 
in the pdbmat.dat input file of a 12A cutoff in the spring network. The protein 
structure is given as a pdbmat . structure file, consisting of only the a carbon lines 
from the all-atom structure. The output is a pdbmat . eigenfacs file which, for a 
protein of TV residues, contains 3N mode eigenvectors. Each eigenvector is described 
with a mode number, a frequency, and TV lines each giving a Cartesian vector; the zth 
line is the displacement to be applied to the zth residue. The vector is normalized so 
that the sum of the squares of all displacement vectors is unity. The first few lines of a 
mode appear thus: 

VECTOR 7 VALUE 6.0869E-04 



2.5267E-02 
1 . 7347E-02 
3.5897E-02 



2.1069E-02 
1.5141E-02 
2.5485E-02 



0.1020 

9.3303E-02 

9.2557E-02 



To pass this mode as a bias to First/Froda we prepare a mode . in file giving, for 
each residue, the identity of the a carbon atom for that residue in the pdb file, followed 
by the displacement vector. The first few lines of a mode . in file appear thus: 

2 0.025267 0.021069 0.102 
6 0.017347 0.015141 0.093303 
20 0.035897 0.025485 0.092557 

In Froda, the first displacement vector will be applied as a bias to the motion of all 
atoms in the first residue, and so forth. 

The command line options for First to carry out a Froda simulation with a 
mode bias can be given as follows: FIRST -non $PR0TEIN -E -$CUT -FRODA -mobRCl 
-freq $FREQ -totconf $T0TC0NF -modei -step $STEP -dstep $DSTEP 
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Taking terms in order, FIRST is the First executable; -non is to run in 
noninteractive mode; $PR0TEIN is the name of the all-atom .pdb format input file; 
-E -$CUT runs the rigidity analysis with hydrogen-bond cutoff energy -$CUT; FRODA 
invokes the Froda algorithm to simulate flexible motion; -mobRCl specifies that the 
largest rigid cluster is to be mobile, like the smaller rigid clusters, during the simulation; 
-f req $FREQ specifies the frequency with which newly generated conformations are to 
be written to file as new conformations; -totconf $T0TC0NF specifies the total number 
of new conformations to generate; modei specifies that mode, in should be read and 
applied as a bias; -step $STEP specifies the magnitude of random perturbations of 
the atomic positions in the generation of each new conformation; and -dstep $DSTEP 
specifies the magnitude of the displacement of the structure along the mode eigenvector 
in the generation of each new conformation. 

For all our calculations we have set $FREQ to 100 so as to save every 100th 
conformation; we have used a $STEP of 0.1 A; and we have used $DSTEP values of 0.01 A 
and -0.01A so as to project each mode in both possible directions. Thus to explore 
the motion of protein. pdb at an energy cutoff of —1.0 kcal/mol, we would use two 
command lines as follows 

FIRST -non protein. pdb -E -1.000 -FRODA -mobRCl -freq 100 -totconf 2500 
-modei -step 0.1 -dstep 0.01 

FIRST -non protein. pdb -E -1.000 -FRODA -mobRCl -freq 100 -totconf 2500 
-modei -step 0.1 -dstep -0.01 



S2. Raw vs fitted RMSD 



Figure [ST] shows the importance of using a fitted RMSD to identify and subtract correctly 
the rigid-body motion during a simulation. The raw RMSD values (in cyan) do not 
saturate, whereas the fitted RMSD values (in black) do. As presented in the figure, the 
evolution of the fitted and raw RMSD values shows that raw RMSD values continue 
to increase linearly, hence accounting for network internal motion as well as spatial 
rigid-body translation (cp. figure [Si})) , whereas fitted RMSD values saturate as they 
account for network internal motion only (cp. figure (Slfc) . The good overlap of the 



conformers 5000 (orange) and 10000 (green) as shown in figure SI: correspond to the 
only minimal increase in fitted RMSD observed in figure [STfc beyond conformer 5000. 
The two structures overlap each other along the polypeptide chain, which indicates that 
there is little motion between the two conformers. However, conformer 5000 and 10000 
have substantially moved with respect to the initial structure, i.e. conformer (black). 
Hence, it is clear from this comparison that we account for the rigid-body motion effect 
by fitting all the conformers to the initial structure. 





Supplementary figure SI. Raw vs fitted RMSD for BPTI. (a) Comparison between 
the RMSD values obtained before (raw RMSD) and after fitting the new conformers 
to the initial structure. We report the fitted and raw RMSD values of mode mj for 
every 100th conformer (only every 500th indicated by a symbol for clarity) and up 
to in total 10000 conformers at a cutoff energy of E cut = —2.2 kcal/mol. The error 
bars denote the standard deviation obtained from including 5 different initial random 
perturbations to the guided motion, see section |2.4| The fitted RMSD values, also 
shown in Figure [3^ for modes 777,7, • • • , mn, saturate whereas the raw RMSD values 
increase linearly, due to Froda weighting each residue by the number of atoms that it 
contains, thus producing a component of rigid body motion. Panels (b) and (c) show 
the superimposed structures for conformers (black), 5000 (orange) and 10000 (green) 
of (b) raw and (c) fitted structures. 
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Supplementary table SI. Extensive RMSD values, maximum RMSD values and 
the cutoff energies choosen to calculate xRMSD for each protein based on the RCD 
graphs. For proteins that are expected to be rigid we have choosen a higher E cut and 
for proteins with an expected conformational change we have choosen a more restrictive 

^cut- 



Protein 


Residues 




RMSD pos 


RMSD neg 


xRMSD pos 


xRMSD neg 






(kcal/mol) 


A 


A 


A 


A 


BPTI 


58 


-2.2 


2.62 


2.66 


152 


154 


Cytochrome-c 


105 


-1.2 


1.44 


1.40 


151 


146 


Kinesin 


360 


-1.1 


2.48 


2.04 


892 


733 


Antitrypsin 


394 


-1.1 


1.91 


2.04 


753 


804 


PDI 


504 


-0.522 


10.54 


10.40 


5314 


5243 


pLGIC 


1605 


-0.5 


2.56 


2.55 


4113 


4099 



Supplementary table S2. Rigidity analysis cutoff energies. Summary of all cutoff 
energies extracted from the rigidity analysis used in the geometric simulations. RMSD 
data for only a subset of these are shown in the main text. 



Protein PDB code Cutoffs (kcal/mol) 



BPTI 


1BPI 


-0.200, 


-1.700, 


-2.200 


Cytochrome-c 


1HRC 


-0.700, 


-1.200 




Kinesin 


1RY6 


-0.400, 


-0.600, 


-1.100 


Antitrypsin 


1QLP 


-0.100, 


-0.500, 


-1.100 


PDI 


2B5E 


-0.015, 


-0.522, 


-0.885, 


pLGIC 


2VL0 


-0.400, 


-0.500 
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(b) 



0.000 AH 

) 2.489 AM 

3 2.474 AM 

■0.021 2.473 A-B 

■0.092 2.458 AH 

0.125 2.452 AH 

■0.859 2.430 AH 

■1.007 2.429 AH 

■1.054 2.428 AH 

1.165 2.426 AH 

1.269 2.426 A- 
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2.073 2.420 A- 
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■2.214 2.416 A- 

-2.503 2.414 A— 

1 2 All A— 

2 2.408 A- 
■3.245 2.405 A- 
■4.048 2.397 A- 
■4.259 2.395 A- 




Supplementary figure S2. Rigid cluster decomposition graphs for: (a) BPTI 
(1BPI) (b) cytochrome-c (1HRC) (c) ad-antitrypsin (1QLP) (d) internal kinesin motor 
domain (1RY6) (e) yeast PDI (2B5E) and (f) pLGIC (2VL0). The x axis represents 
the protein backbone and the y axis the energy E cut , of the last hydrogen bond, 
which after being removed provokes a change in the rigidity distribution. Each line 
represents the new rigidity distribution of the polipeptide chain induced by removing a 
bond which alters the previous rigidity configuration. The residues belonging to rigid 
clusters are coloured — with the biggest rigid cluster coloured in red, whereas the 
flexible regions are shown as thin black lines. We choose the energy cutoffs defining 
the number of rigidity constraints using the RCD plots. For a detailed description of 
RCD graphs see Refs. [8j[9] 
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(a) 




(b) 




(c) 




Supplementary figure S3. Tertiary structure of BPTI (1BPI). Colouring is defined 



using the rigidigy analysis results shown in Figure S2 Flexible regions are illustrated 
in black whereas rigid residues are coloured as per the rigid cluster they belong to. The 
biggest rigid cluster is coloured in blue. The number and size of the rigid clusters vary 
depending on the chosen cutoff value, which for BPTI are (a) E cut = —0.2 kcal/mol, 
(b) E cut = — 1.7 kcal/mol and (c) E cut = —2.2 kcal/mol. Note that the colour code 
used to represent residues within the same rigid cluster is not the same in the RCD 
and in the tertiary structures. The biggest rigid cluster in the RCD graphs is noted in 
red and in the tertiary structures is noted in blue. 




Supplementary figure S4. Tertiary structure of cytochrome-c (1HRC). Colouring of 
the tertiary structure is defined as in Figure S3 but with cutoff energies (a) E cut — —0.7 
kcal/mol and (b) E cut = — 1.2 kcal/mol. 
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Supplementary figure S5. Tertiary structure of al-antitrypsin (1QLP). Colouring 
of the tertiary structure is defined as per Figure S3 but at cutoff energies of (a) 
^cut = — 0.1 kcal/mol, (b) E cut — — 0.5 kcal/mol and (c) E cut = — 1.1 kcal/mol. 




Supplementary figure S6. Tertiary structure of internal kinesin motor domain 
(1RY6). Colouring of the tertiary structure is defined as in Figure |S3| but with cutoff 
energies (a) E cut = —0.4 kcal/mol, (b) E cut = —0.6 kcal/mol and (c) E cut = —1.1 
kcal/mol. 
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Supplementary figure S7. Tertiary structure of yeast PDI (2B5E). Colouring of 
the tertiary structure is defined as in Figure [S3] but with cutoff energies (a) E cut = 
-0.015 kcal/mol, (b) E cut = -0.522 kcal/mol, (c) E cut = -0.885 kcal/mol and (d) 
^cut = —1.412 kcal/mol. Note that colors are assigned according to cluster size which 
changes depending on the cutoff energy. 
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(a) 




(b) 




Supplementary figure S8. Tertiary structure of pLGIC (2VL0). Colouring of the 
tertiary structure is defined as in Figure S3 but with cutoff energies (a) E cut = —0.4 
kcal/mol and (b) E cut = —0.5 kcal/mol. Note that the protein appears to be rigid for 
^cut = —0.4 kcal/mol and that there is a switch-like first order rigidity transition at 
^cut = — 0.5 kcal/mol which reveals the most flexible parts of the secondary structure 
which allow mobility. 
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Supplementary figure S9. Dot product graph for BPTI (IBPI). The dot product 
-m^ between an initial starting mode and its current mode , j = 7, . . . , 11 

(c) 

as the initial structure is projected along the initial mode. The current modes, m - , 
are obtained from performing normal mode analysis on the current conformations as 
the initial structure is projected along an initial mode For clarity dot products 

for only 25 conformations of each direction of motion are shown. The evolution of the 
dot product along the conformations is reported for different cutoff energies, which for 
BPTI are (a) E cut = -0.2 kcal/mol, (b) E cut = -1.7 kcal/mol and (c) E cut = -2.2 
kcal/mol. The horizontal and vertical dotted lines denote the largest possible value of 
mf* • mf* and the zero on the conformer axis, respectively. 
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Supplementary figure S10. Dot product graph for cytochrome-c (1HRC) as 
described in Figure S9 but with E cut values of (a) —0.7 kcal/mol and (b) —1.2 kcal/mol. 
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Supplementary figure Sll. Dot product graph for cd-antitrypsin (1QLP) 



as described in Figure S9 but with E cut values of (a) E cut = —0.1 kcal/mol, (b) 
^cut = — 0.5 kcal/mol and (c) E cut = — 1.1 kcal/mol. 
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Supplementary figure S12. Dot product graph for internal kinesin motor domain 
(1RY6) as described in Figure [S9| but with E cut values of (a) —0.4 kcal/mol (b) —0.6 
kcal/mol and (c) —1.1 kcal/mol. 
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Supplementary figure S13. Dot product graph for yeast PDI (2B5E) as described 



in Figure S9 but with E cut values of (a) E cut — —0.015 kcal/mol, (b) E cut = —0.522 
kcal/mol, (c) E cut = -0.885 kcal/mol and (d) E cut = -1.412 kcal/mol. 
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Supplementary figure S14. Dot product graph for a ligand gated ion channel 
protein (2VL0) as described in Figure [S9]3ut with E cut values of (a) E cut = —0.4 
kcal/mol and (b) at E cut = — 0.5 kcal/mol. 



